Compiling Smalltalk-80 to a RISC


1.  Introduction 

The goal of the Smalltalk On A RISC (SOAR) project at U.C. Berkeley was to
produce a high-performance execution engine for the Smalltalk-80¹ language [1]. The
heart of the effort is a Berkeley RISC processor extended to support Smalltalk [2]. The
processor was designed in conjunction with the runtime system [3], and together
they have yielded substantial performance improvements over conventional Smalltalk-80
implementations. An extensive performance evaluation can be found in Ungar's
dissertation [4]; SOAR runs Smalltalk roughly 2.5 times as fast as a Motorola 68010 with a
similar cycle time, which works out to be about the same speed as the Xerox Dorado
high-performance workstation. One reason for the SOAR system's speed is its
compilation of Smalltalk. It has been estimated that compiling Smalltalk to the SOAR
instruction set produces a factor of two speedup over conventional interpreted
systems. This paper describes the SOAR compiler, presenting the mechanisms of
compiling Smalltalk to the special-purpose SOAR RISC instruction set.

2.  The  Nature  of  the  Compiler 

The Smalltalk-80 language is defined operationally in terms of a virtual machine
that executes stack-based instructions called bytecodes. The Smalltalk-80
programming environment is a binary image that runs on the virtual machine. The problem
with the virtual machine is that it is inefficient if naively implemented. It is
commonly realized as a bytecode interpreter, which requires special hardware (such as that
possessed by the Dorado) to avoid interposing a layer of overhead between the virtual
machine and the native machine.

An alternative is to compile bytecodes to native machine instructions, an
approach successfully taken by Deutsch and Schiffman [5]. The Deutsch and Schiffman
system dynamically translates procedures (called methods in Smalltalk) as needed,
keeping a cache of native-code methods and flushing the least-recently-used ones. The
SOAR system takes this approach a step further and compiles all methods from
bytecodes into SOAR instructions: the canonical representation of a method in the
SOAR system is as SOAR instructions, not bytecodes. Whereas caching is done for space
efficiency, compiling everything simplifies and speeds up the system for a moderate
cost in space.

¹Smalltalk-80 is a trademark of Xerox Corporation.

We considered compiling Smalltalk directly to SOAR, avoiding bytecodes
entirely, but we did not take this path for three reasons. First, the virtual machine is
the semantic definition of the language. This implies that a correct bytecode compiler
is, along with correct system functions, a correct Smalltalk implementation. Second,
keeping bytecodes as an intermediate form permits mixed-mode debugging like that
found in LISP systems, where functions to be debugged are interpreted, and tested
functions that should run fast are compiled. Third, the standard virtual image is a
bytecode-based image; to produce a SOAR-based native image those bytecodes must
be translated to SOAR instructions. Rather than develop two separate compilers, one
for taking bytecodes to SOAR and another for compiling Smalltalk to SOAR, we
built one bytecode compiler in two implementations: a C version for image
conversion, and a Smalltalk version for use with the converted image. Debugging is
addressed in more detail in Lee's report [6], and image conversion in a recent paper [3].

The basic task of the compiler is to translate stack-oriented bytecodes into
RISC-style loads, stores, and other register-based instructions. It does this by assigning
Smalltalk variables and stack locations to registers and memory locations, and then
simulating at compile time the bytecode stack operations, converting them to SOAR
operations. The simulated stack is used to remember value sources and operations;
when a value destination is encountered the code to load, compute, and store the value
is generated. If the Smalltalk variables A and B are assigned to registers, for example,
a push of A and a pop-and-store into B is translated into a register-to-register move;
the simulated stack is used to remember the source A until the destination B is
encountered. The stack is simulated at compile time to avoid unnecessary
computations at runtime.
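The push and pop-and-store example can be sketched in a few lines of Python. This is only an illustration of the idea, not the SOAR compiler: the bytecode names (`push`, `pop_store`), the register names, and the `move` mnemonic are all assumptions made for the sketch.

```python
# Illustrative sketch only: the bytecode names, register names, and the
# output mnemonic are assumptions, not the actual SOAR compiler's.

def compile_bytecodes(bytecodes, location):
    """Translate stack bytecodes to register code by simulating the stack.

    `location` maps each Smalltalk variable to its assigned register.
    The simulated stack holds value sources (register names), so a push
    emits no code; code is emitted only when a destination appears.
    """
    stack = []   # compile-time stack of value sources
    code = []    # generated register-based instructions
    for op, arg in bytecodes:
        if op == "push":
            stack.append(location[arg])        # remember the source only
        elif op == "pop_store":
            source = stack.pop()               # destination found: emit code
            code.append(f"move {source}, {location[arg]}")
        else:
            raise ValueError(f"unhandled bytecode: {op}")
    return code


registers = {"A": "r8", "B": "r9"}
print(compile_bytecodes([("push", "A"), ("pop_store", "B")], registers))
```

No runtime stack traffic is generated: the two bytecodes collapse into a single register-to-register move.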


In this regard, the Smalltalk bytecodes perform the same function as any
stack-based intermediate language such as, for example, UCODE [7]. Bytecodes are unlike
UCODE in that, as we will discuss later, they restrict the compiler writer, particularly
in implementing optimizations that would result in code motion or that require type
information to perform. For example, common subexpression elimination is
exceptionally difficult to do in Smalltalk.

Given the restrictive semantics of the Smalltalk bytecodes and the simple
architecture of SOAR, the compiler stands or falls on its success in mapping Smalltalk
variables to registers and memory.

3.  SOAR  Register  Windows 

One feature of the Berkeley RISC architectures is a register file of overlapping
register windows, each window corresponding to a procedure activation frame. The
windows are allocated on procedure call in a stack discipline, using the registers in the
window overlap to pass parameters. The advantage of register windows is fast
procedure call and return, avoiding the saving and restoring of registers. Tests with
benchmarks indicate that SOAR Smalltalk would be 46% slower without them.²

It  is  crucial  that  the  size  of  these  register  windows  be  chosen  wisely.  If  a  method 
requires  more  storage  than  can  fit  in  a  window,  the  extra  values  are  spilled  to 
memory,  slowing  down  the  procedure  call  and  the  method's  execution  as  well  as 
increasing  its  code  size.  If  windows  are  too  small,  an  excessive  number  of  methods  will 
spill,  degrading  the  performance  of  the  whole  system.  On  the  other  hand,  if  windows 
are  too  large,  registers  will  be  wasted  and  fewer  windows  can  be  accommodated  on  the 
processor  chip. 

For SOAR, the goal was to have 90% to 95% of all activation frames fit in
SOAR windows. Preliminary studies [9] indicated that a window size of 16 registers
with complete overlap was best. The SOAR windows are smaller than those of other
RISC designs such as RISC II, which has 32, because Smalltalk methods are
correspondingly smaller than procedures in more traditional imperative-style
languages.

²All results quoted in this paper are found in either Ungar's dissertation [4] or Bush's report [8],
unless stated otherwise.

An alternative would have been to use variable size windows at some expense in
hardware [10]. For SOAR, we concluded that variable size windows would not yield
an improvement significant enough to offset the additional hardware complexity.

The  SOAR  hardware  divides  each  window  into  two  identical  sets  of  8  registers,  a 
high  set  (15-8)  and  a  low  set  (7-0).  Each  set  contains  6  general  purpose  registers  and  2 
dedicated  registers  used  for  return  addresses  (15  and  7)  and  return  values  (14  and  6). 

The current method receives its parameters and stores its local variables in the
high register set. It sets up parameters for any methods that it calls in the low set.
Unlike other RISC designs, the SOAR architecture has no registers dedicated solely for
local use; all are shared between two activation records. This gives the compiler
flexibility in register allocation: registers can be used for arguments or local values
depending on what is appropriate. Temporaries that must persist through calls to
other methods (‘retained’ temporaries) are put in free high registers. ‘Transitory’
temporaries (those whose life spans do not cross procedure calls, such as intermediate
results from compiler-generated expressions) can be put in the lows. Figure 1 presents
this categorization pictorially.
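The sharing of a register set between caller and callee can be modeled with a small Python sketch. The physical slot numbering and the direction in which the window moves on a call are assumptions chosen for illustration; only the 16-register window, the two 8-register sets, and the pairing of registers 15/7 and 14/6 come from the text.

```python
# Illustrative model of fully overlapped 16-register windows. The
# numbering of physical slots and the direction in which the window
# pointer moves on a call are assumptions made for this sketch.

SET_SIZE = 8        # registers in the high set and in the low set
WINDOW_SIZE = 16    # registers visible to one activation


def physical_slot(window, register):
    """Map (window number, register 0-15) to a slot in the register file."""
    if not 0 <= register < WINDOW_SIZE:
        raise ValueError("register must be in 0..15")
    # Each call advances the window by one 8-register set, so the
    # caller's low register r and the callee's high register r + 8
    # name the same physical slot.
    return window * SET_SIZE + (WINDOW_SIZE - 1 - register)


caller, callee = 0, 1
for low in range(SET_SIZE):
    assert physical_slot(caller, low) == physical_slot(callee, low + SET_SIZE)
```

Under this model an argument written to a caller's low register is already in place as the callee's high register, so no copying is needed on a call.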

Allocating variables and temporaries to registers is complicated by the fact that
it is possible to write a Smalltalk method whose variables and temporaries will not all
fit in a register window. When more registers are required than are available, we
resort to spilling. There are two rules used to determine what and when to spill to
memory. The first rule, assignment by category, specifies that entire categories of
variables are spilled: if not all of the arguments fit in the registers, for example, all
are spilled. The second rule, permanent assignment, means that a variable cannot be
moved once it has been allocated a location: if, for example, a local variable has been
put in a register, it will not later be spilled to make room for a temporary. Neither of
these rules results in minimal register usage or minimal memory traffic, but they are
reasonable and simple. Since a major goal of the SOAR architectural design was to
minimize spills at a reasonable cost, their infrequent occurrence justified an easily
modifiable, simple spilling strategy.

Figure 1: Register Allocation in a SOAR Register Window

The details of our allocation and assignment strategy are straightforward. The
compiler assigns up to four arguments and locals to a method's high registers, and the
remaining high registers are used for retained temporaries. If there are more
arguments than will fit in four registers, all arguments are put in memory. If the
arguments will fit in four registers, but the local variables will not fit with the arguments,
all of the locals are stored in memory. Transitory temporaries can be stored in the
lows. Retained temporaries are stored in registers if any are available at that point in
the computation, and in memory otherwise.
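The category rules above can be written down as a short Python sketch. The function name and the shape of the result are hypothetical. The limit of four argument and local registers and the six general-purpose high registers come from the text; the treatment of locals when the arguments have been spilled is not stated in the text and is an assumption here.

```python
# Sketch of assignment by category. Names and the result layout are
# hypothetical; only the two numeric limits are taken from the text.

ARG_LOCAL_LIMIT = 4     # high registers offered to arguments and locals
GENERAL_HIGH_REGS = 6   # general-purpose registers in the high set


def assign_categories(num_args, num_locals):
    """Place whole categories in registers or memory, never part of one."""
    args_in_regs = num_args <= ARG_LOCAL_LIMIT
    used = num_args if args_in_regs else 0

    # Assumption: when the arguments have been spilled, the locals may
    # still claim the four registers if they fit on their own.
    locals_in_regs = used + num_locals <= ARG_LOCAL_LIMIT
    if locals_in_regs:
        used += num_locals

    return {
        "arguments": "registers" if args_in_regs else "memory",
        "locals": "registers" if locals_in_regs else "memory",
        # Whatever is left over is available to retained temporaries.
        "high_registers_for_retained": GENERAL_HIGH_REGS - used,
    }


print(assign_categories(3, 2))
```

With three arguments and two locals, the arguments stay in registers while the locals, which no longer fit alongside them, all go to memory.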

A nice benefit of the above rules is that for most Smalltalk methods, code can be
generated in one pass over the bytecodes by making some initial assumptions about
the method's register requirements. If the assumptions turn out to be false, a second
pass can be made with the new information. Only 9% of all methods in the
Smalltalk-80 system require the second pass.³

Spills can be implemented in one of two ways: space can be allocated from
a common spill pool, or a separate spill object can be allocated for each activation
frame that spills. The former has complications for garbage collection and processes,
and the latter eats up a register in each spilling frame. Both techniques have been
tried in SOAR; neither has shown itself to be clearly superior to the other.

4.  How  the  SOAR  Architecture  Simplifies  the  Compiler 

Register  windows  and  spills  complicate  the  compiler;  SOAR  architectural  support 
for  Smalltalk  has  in  the  main  simplified  it.  An  important  architectural  feature  allows 
the  compiler  to  generate  standard  arithmetic  and  comparison  instructions  in  spite  of 
the  fact  that  the  operations  may  be  on  non-integer  objects. 

Since Smalltalk is polymorphic and variable types are not known at compile
time, it is never safe to generate integer instructions without runtime type checking.
The virtual machine requires a dynamic method lookup on each operator, including
‘+’. Implementing this lookup naively is very expensive. Studies have shown [11] that
in fact most arithmetic operations in Smalltalk involve only integers. SOAR takes
advantage of this fact by assuming all simple arithmetic operations (plus, minus,
comparisons, etc.) will be performed only on integers. The compiler thus treats all such
operations as if they were on integers, and generates integer code. If, say, an add
instruction is initiated on two objects, and one or both of them turn out to be
non-integers, the hardware will trap and transfer to a handler that will look up the correct
method for the intended operation. When the execution of the looked-up method is
complete, the trap handler returns to the instruction immediately following the one
that caused the trap.
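The control flow of such an add can be modeled in Python. In SOAR the tag check happens in hardware; the explicit test below exists only to make the two paths visible. The `Point` class, the `METHODS` table, and the function names are hypothetical stand-ins, and the dictionary lookup is a simplification of Smalltalk's method lookup.

```python
# Sketch of the integer-assumption-plus-trap scheme. In SOAR the tag
# check is done by hardware; here it is an explicit test purely for
# illustration. All names are hypothetical.

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def plus(self, other):            # stands in for a Smalltalk '+' method
        return Point(self.x + other.x, self.y + other.y)


METHODS = {(Point, "+"): Point.plus}  # toy method dictionary


def is_tagged_integer(value):
    # bool is excluded because Python treats it as an int subclass.
    return isinstance(value, int) and not isinstance(value, bool)


def trap_handler(selector, receiver, argument):
    """Full lookup, keyed on the receiver's class, run only after a trap."""
    method = METHODS[(type(receiver), selector)]
    return method(receiver, argument)


def add_instruction(a, b):
    """Model of a single compiled add: fast path first, trap otherwise."""
    if is_tagged_integer(a) and is_tagged_integer(b):
        return a + b                      # common case: no lookup at all
    return trap_handler("+", a, b)        # rare case: trap and look up
```

The compiled code is the same single add in both cases; only the rare non-integer operands pay for the lookup.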

It would not be difficult for the compiler to generate code to test explicitly for
integer tags, but it would slow down system performance by an estimated 26% and
would increase image size by 15%.

³The Smalltalk and the C versions of the compiler differ in this regard: the C version effectively
makes two passes over each method. However, speed was not as critical for the C version as for the
Smalltalk version.

The tag mechanism is also used to provide hardware assist for garbage collection,
thus further simplifying the task of code generation. SOAR uses a generation
scavenging scheme, dividing memory into regions of old and new objects, and uses generation
tags on pointers to detect when pointers to new objects are stored into old ones. The
tag check is performed in the hardware, and a trap handler records the necessary
information if a trap is taken. Benchmark results indicate that tagged stores are so
rare that the compiler could in fact generate explicit tests and only slow the system
about 1% and expand the image 2%.
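The explicit test that the compiler could generate in place of the hardware check might look like the following Python sketch. The `Obj` class, the integer encoding of generations, and the `remembered_set` list are assumptions for illustration; the text says only that stores of new-object pointers into old objects are detected and recorded.

```python
# Sketch of the store check used by generation scavenging. SOAR does
# the comparison in hardware; the explicit test and all names here are
# illustrative assumptions.

class Obj:
    def __init__(self, generation):
        self.generation = generation  # assumed encoding: 0 = new, larger = older
        self.fields = {}


remembered_set = []  # old objects that may hold pointers to new ones


def store(target, field, value):
    """Store `value` into `target`, recording old-to-new pointers."""
    if isinstance(value, Obj) and value.generation < target.generation:
        # This branch plays the role of the trap handler.
        if not any(entry is target for entry in remembered_set):
            remembered_set.append(target)
    target.fields[field] = value
```

Stores of non-pointers, and stores into new objects, fall straight through to the assignment, which is why the check is cheap when such stores are rare.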

The compiler also takes advantage of hardware that maps registers to memory
addresses, and allows what we call a pointer to register. Since a Smalltalk program
can access any object in memory, and activation records are just another kind of
object, it is necessary to handle references to them. Remember that activation records
exist as on-chip register windows. The pointer to register feature permits the compiler
to ignore complications that would otherwise be caused by overt references to
activation records. Once again, however, it turns out that checking for pointers to
activation records is not as severe a problem as it was initially feared to be. The instances
requiring such a check do not occur often, and Ungar concluded that the feature could
be removed with only a 3% performance penalty.

5.  How  Bytecode  Compilation  Simplifies  the  Compiler 

Both  language  and  pragmatic  considerations  limited  the  optimizations  we  could 
perform. 

First,  Smalltalk  is  a  polymorphic  language.  That  is,  the  same  ‘A+B’  expression 
within  a  Smalltalk  method  could  on  one  instantiation  add  two  integers  and  on  the 
next  concatenate  two  strings.  In  the  latter  case  it  is  not  true  that  ‘A+B’  equals  ‘B+A’. 
Furthermore,  which  method  is  invoked  is,  by  definition,  a  function  of  the  leftmost 
operand  (‘A’  in  our  example).  Together  these  facts  preclude  almost  all  expression 
evaluation optimization.

Second, the fact that activation frames are full-fledged Smalltalk objects requires
a  relatively  straightforward  mapping  between  bytecode  frames  and  compiled  ones, 
eliminating  optimizations  such  as  method  integration. 

Third, our pragmatic goal was to bring up a working Smalltalk-80 image for the
SOAR processor. This made us fundamentally conservative in our approach to the
language. Correctness, as defined by the bytecodes and the image, was more
important than efficiency. Because of this we were loath to change the image (including the
standard Smalltalk-80 compiler) or take advantage of ambiguities in the language
specification. We were not in the normal position of a compiler writer.

There is one optimization which has been employed successfully by Deutsch and
Schiffman and the SOAR system. While it is true that the language permits
polymorphic expressions, the fact of the matter is that 90% of the time the method
invoked does not change. Thus, for example, if ‘A+B’ added complex numbers the last
time it was evaluated, it is highly likely that it will be used to add complex numbers
the next time it is evaluated. Using this observation we cache the last method called.
When it is called again, we merely check that the type of the leftmost operand in this
call matches the type of the leftmost operand from the last call. If they do not match,
the standard method lookup mechanism is re-invoked. Ungar estimated that removing
this caching strategy would slow down our system by 33%.
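A one-entry cache of this kind can be sketched in Python as follows. The `CallSite` class and its fields are hypothetical, and the slow path is a plain dictionary lookup standing in for the standard Smalltalk method lookup, which in a real system also searches superclasses.

```python
# Sketch of a one-entry cache at a call site. Names are hypothetical
# and the slow path is a dictionary lookup standing in for the standard
# method lookup.

class CallSite:
    def __init__(self, selector, method_table):
        self.selector = selector
        self.method_table = method_table  # {(class, selector): function}
        self.cached_class = None          # class of the last receiver
        self.cached_method = None         # method found for that class
        self.misses = 0

    def send(self, receiver, *args):
        receiver_class = type(receiver)
        if receiver_class is not self.cached_class:
            # Cache miss: fall back to the full lookup and remember it.
            self.misses += 1
            key = (receiver_class, self.selector)
            self.cached_method = self.method_table[key]
            self.cached_class = receiver_class
        return self.cached_method(receiver, *args)


table = {
    (int, "+"): lambda a, b: a + b,
    (str, "+"): lambda a, b: a + b,   # concatenation for strings
}
site = CallSite("+", table)
site.send(1, 2)
site.send(3, 4)        # same receiver class: served from the cache
site.send("a", "b")    # receiver class changed: full lookup again
```

Only the class of the receiver (the leftmost operand) is compared, which matches the rule that the receiver alone determines the method invoked.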

6.  Results 

Three  types  of  performance  measurements  were  made  with  the  SOAR  compiler: 
compilation  speed,  code  expansion,  and  register  window  utilization. 

Since the SOAR compiler is an extra stage added to the bytecode compiler, it can
only slow compilation down. However, on a Dorado (equivalent in performance to
SOAR) the compiler is reasonably fast both objectively and subjectively. Results from
compiling the entire Smalltalk-80 image of 4770 methods indicate that it adds a mean
time of 50 milliseconds to a method's total compilation time. This lengthens
compilation by 10%. Subjectively, the compiler does not intrude on system use since it is
usually invoked interactively on one method at a time. A mean total compilation time of
about 120 milliseconds per method does not noticeably affect response time.


One of the virtues of bytecodes is that they are a compact representation. A
major concern at the start of the project was an anticipated explosion in method size
that could result from moving to four-byte word-sized instructions. Preliminary
estimates indicated potential expansion of up to 1000% [12]. Fortunately, observed
expansion with the actual implementation is considerably lower. For the entire
Smalltalk-80 image the mean bytecode method size is 32.5 bytes; that expands to a
mean length of 40.6 SOAR words,⁴ an expansion factor of 499%. We note here that
the largest method compiles into 637 bytes of bytecodes; it takes the Smalltalk-80
compiler and our compiler approximately 3 seconds to compile this method from
Smalltalk into 684 SOAR words. These expansion results are very similar to the
expansion factor of 503% reported by Deutsch and Schiffman [5] for their comparable
translator with in-line cache.

To  put  these  averages  in  perspective,  total  method  storage  required  for  the 
bytecode  image  is  155025  bytes,  and  is  774648  bytes  for  the  SOAR  version.  For 
current  workstations  with  memories  in  the  four  to  eight  megabyte  range,  this  increase 
of  roughly  600k  bytes  is  not  significant. 
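The expansion figures can be rechecked from the numbers quoted above. The short Python calculation below uses only values from the text; the variable names are ours.

```python
# Recomputing the expansion figures from the numbers quoted in the text.

BYTES_PER_SOAR_WORD = 4           # four-byte word-sized instructions

mean_bytecode_bytes = 32.5        # mean bytecode method size
mean_soar_words = 40.6            # mean compiled method length
mean_expansion = mean_soar_words * BYTES_PER_SOAR_WORD / mean_bytecode_bytes

bytecode_image_bytes = 155025     # total method storage, bytecode image
soar_image_bytes = 774648         # total method storage, SOAR image
total_expansion = soar_image_bytes / bytecode_image_bytes
increase_bytes = soar_image_bytes - bytecode_image_bytes

print(f"mean expansion:  {mean_expansion:.3f}x")
print(f"total expansion: {total_expansion:.3f}x")
print(f"increase:        {increase_bytes} bytes")
```

Both ratios come out just under 5, consistent with the reported 499%, and the difference in total storage is a little over 600k bytes.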

A major goal of the SOAR design was to have over 90% of the methods fit all
their storage into registers. Static results from the compiler support the current
window size. Only 9% of all methods spill. Examining total register requirements also
supports the 16 register window. These numbers are expressed in terms of the numbers of
high registers used, because all method-specific allocation is done in the high registers.⁵


number of high registers    methods using no more than that number
------------------------    --------------------------------------
            4                                 65%
            8                                 92%
           16                                 99%
           32                                100%

Dynamic results from benchmarks confirm the static results, and more emphatically
endorse the chosen window size. Dynamically, less than 3% of methods spill. The
above figures show that if we increased the register window set size to 32 from 8, all
methods in the Smalltalk-80 system would fit in register windows, and spilling could
effectively be eliminated. However, since less than 3% of all calls require spilling, even
if we could (somehow) keep the same number of register windows on the chip,
increasing the window size from 16 to 64 registers would not be cost effective.

⁴The method lengths include literals and in-line cache instructions.

⁵These numbers include the 2 special receiver and return address registers. Note that the percentage
of methods that spill is higher than the percentage of methods that require more than 8 registers because
the registers are divided up into two regions (arguments and locals, and temporaries) that have different
spill disciplines.

Static  results  also  indicate  that  the  current  spill  rules  are  reasonable. 


category                mean size    maximum size
--------------------    ---------    ------------
arguments                  03             13
local variables            0.68           19
retained temporaries       0.82           10
total registers            2.32           28
spill area                 0.45           22

Argument,  local  variable,  and  temporary  demands  are  small  on  the  average,  but  in  the 
worst  case  all  exceed  the  window  size. 

7.  Conclusions 

The Smalltalk-80 language and its bytecode representation restrict the
conventional optimizations available to the SOAR compiler. Nonetheless, the compiler
generates efficient code, primarily due to register windows, integer tags and traps, and
in-line method caches. Experience with the compiler has verified the architectural
design decision to use a 16-register fully overlapped window. Several features
supported by the SOAR hardware could easily be performed by the compiler at marginal
increased time and space costs.